An ML model can randomly sample datapoints and send them to the oracle for labeling. Random sampling will also eventually capture the global distribution of the dataset in the training points; active learning, however, aims to reach a good model faster by intelligently selecting which datapoints to label. Random sampling is therefore a natural baseline to compare against active learning.
Least confident: In this method, we choose the samples for which the probability of the most probable class is minimum.
Margin sampling: In this method, we choose the samples for which the difference between the probabilities of the most probable and second most probable classes is minimum.
Entropy: In this method, we choose the samples with maximum entropy. For $N$ classes, entropy can be calculated using the following equation, where $P(x_i)$ is the predicted probability of the $i^{th}$ class. \begin{equation} H(X) = -\sum\limits_{i=1}^{N}P(x_i)\log_2 P(x_i) \end{equation}
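As a rough sketch, all three uncertainty scores can be computed from a model's predicted class probabilities with NumPy (the function names here are ours, not from any particular library):

```python
import numpy as np

def least_confident(proba):
    # 1 - max class probability; higher means more uncertain
    return 1.0 - proba.max(axis=1)

def margin(proba):
    # Gap between the top-2 class probabilities; lower means more uncertain
    s = np.sort(proba, axis=1)
    return s[:, -1] - s[:, -2]

def entropy(proba, eps=1e-12):
    # Shannon entropy in bits; higher means more uncertain
    return -np.sum(proba * np.log2(proba + eps), axis=1)

# Predicted probabilities for 3 samples over 3 classes
proba = np.array([[0.10, 0.80, 0.10],
                  [0.40, 0.35, 0.25],
                  [1/3,  1/3,  1/3]])

# All three strategies flag the near-uniform last row as most uncertain:
# it has the smallest top probability, the smallest margin (0), and the
# largest entropy (~log2(3) bits).
print(least_confident(proba))
print(margin(proba))
print(entropy(proba))
```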
In Animation 3, the boundaries between differently colored regions are the models' decision boundaries. The points queried by the committee are those on which the learners disagree the most, which can be observed in the plot. Initially, each model learns a different decision boundary for the same data; over the iterations they converge to a similar hypothesis and thus begin learning similar decision boundaries.
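One common way to quantify this committee disagreement is vote entropy: the entropy of the distribution of class labels the members vote for on each unlabeled sample. A minimal sketch (helper name and data are ours, for illustration only):

```python
import numpy as np

def vote_entropy(votes, n_classes):
    # votes: (n_members, n_samples) array of predicted class labels.
    # Returns one disagreement score per sample (entropy of the vote counts).
    n_members, n_samples = votes.shape
    scores = np.empty(n_samples)
    for j in range(n_samples):
        counts = np.bincount(votes[:, j], minlength=n_classes)
        p = counts / n_members
        p = p[p > 0]                      # skip classes with zero votes
        scores[j] = -np.sum(p * np.log2(p))
    return scores

# 3 committee members voting on 4 unlabeled samples (classes 0..2)
votes = np.array([[0, 1, 2, 0],
                  [0, 1, 0, 1],
                  [0, 1, 1, 2]])
disagreement = vote_entropy(votes, n_classes=3)
query_idx = int(np.argmax(disagreement))  # query the most-contested sample
```

Samples where all members agree score 0, while a three-way split scores log2(3) bits, so the committee queries the label of the most contested point first.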
We now compare the overall F1-score of random sampling with that of our model. QBC outperforms random sampling most of the time.
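Figure 14 plots one overall F1-score per active-learning iteration. A hypothetical stand-in for the notebook's `overall_acc` helper, assuming a weighted F1 computed with scikit-learn, might look like:

```python
import numpy as np
from sklearn.metrics import f1_score

def overall_f1(pred_per_iter, y_true, average='weighted'):
    # One weighted F1-score per iteration; pred_per_iter[i] holds the
    # model's test-set predictions after iteration i.
    return [f1_score(y_true, y_pred, average=average)
            for y_pred in pred_per_iter]

# Toy example: predictions after two "iterations" on the same test labels
y_true = np.array([0, 1, 2, 1, 0])
preds = [np.array([0, 1, 1, 1, 0]),   # one mistake
         np.array([0, 1, 2, 1, 0])]   # perfect
scores = overall_f1(preds, y_true)
```

The weighted average accounts for class imbalance, so all three Iris classes contribute in proportion to their support.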
# Create traces
import plotly.graph_objects as go
import plotly.express as px

layout = go.Layout(
    paper_bgcolor='rgb(255,255,255)',
    plot_bgcolor='rgb(255,255,255)'
)
fig = go.Figure(layout=layout)
fig.add_trace(go.Scatter(x=list(range(1,1+len(list_pred_all_iris))), y=overall_acc(list_pred_all_iris),
mode='lines+markers',
name='Query by committee',
line=dict(width=2,color=px.colors.DEFAULT_PLOTLY_COLORS[1]),
hovertemplate='(%{x:.2f},%{y:.2f})'))
fig.add_trace(go.Scatter(x=list(range(1,1+len(rand_pred_all_iris))), y=overall_acc(rand_pred_all_iris),
mode='lines+markers',
name='Random sampling',
line=dict(width=2,color=px.colors.DEFAULT_PLOTLY_COLORS[0]),
hovertemplate='(%{x:.2f},%{y:.2f})'))
############# Common
fig.update_yaxes(automargin=True,gridcolor='rgba(128,128,128,0.2)',gridwidth=1,zerolinecolor='rgba(128,128,128,0.2)',zerolinewidth=1)
fig.update_xaxes(automargin=True,gridcolor='rgba(128,128,128,0.2)',
gridwidth=1,zerolinecolor='rgba(128,128,128,0.2)',
zerolinewidth=1,tickvals=list(range(1,31)))
fig.update(layout_coloraxis_showscale=False)
fig.update_layout(title_text='<b>Figure 14:</b> Comparison between QBC and Random baseline on Iris dataset',
title_x=0.5,
xaxis_title='Iterations',
yaxis_title='Overall F1-score',
#font=dict(family="Courier New")
)
fig['layout']['xaxis'].update(side='bottom')
fig.show()